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1  Introduction 


Image  processing,  a  traditionally  engineering  field,  has  attracted  the  attention  of 
many  mathematicians  during  the  past  two  decades.  From  the  vision  and  cognitive 
science  point  of  view,  image  processing  is  a  basic  tool  used  to  reconstruct  the  rela¬ 
tive  order,  geometry,  topology,  patterns,  and  dynamics  of  the  3-D  world  from  2-D 
images.  Therefore,  it  cannot  be  merely  a  historic  coincidence  that  mathematics 
must  meet  image  processing  in  this  digital  technology  era. 

The  role  of  mathematics  is  also  determined  by  the  broad  range  of  applica¬ 
tions  of  image  processing  in  contemporary  science  and  technology.  These  in¬ 
clude  astronomy  and  aerospace  exploration,  medical  imaging,  molecular  imaging, 
computer  graphics,  human  and  machine  vision,  telecommunication,  auto-piloting, 
surveillance  video,  and  biometric  security  identification  (such  as  fingerprints  and 
face  identification),  etc.  All  these  highly  diversified  disciplines  have  made  it  nec¬ 
essary  to  develop  the  common  mathematical  foundation  and  frameworks  for  im¬ 
age  analysis  and  processing.  Mathematics  at  all  levels  must  be  introduced  to  meet 
the  crucial  qualities  demanded  by  this  new  era  -  genericity,  well-posedness,  accu¬ 
racy,  and  computational  efficiency,  just  to  name  a  few.  In  return,  image  processing 
has  created  tremendous  opportunities  for  mathematical  modeling,  analysis,  and 
computation. 

In  this  article,  we  intend  to  give  a  broad  picture  of  mathematical  image  pro¬ 
cessing  through  one  of  the  most  recent  and  very  successful  approaches  -  the  varia¬ 
tional  PDE  method.  We  first  discuss  two  crucial  ingredients  for  image  processing: 
image  modeling  or  representation,  and  processor  modeling.  We  then  focus  on 
the  variational  PDE  method.  The  backbone  of  the  article  consists  of  two  major 
problems  in  image  processing  -  inpainting  and  segmentation,  which  we  have  per¬ 
sonally  worked  on,  but  by  no  means  do  we  intend  to  have  a  comprehensive  review 
of  the  entire  field  of  image  processing. 

1.1  Image  processing  as  an  input-output  system 

Directly  connected  to  image  processing  are  the  two  dual  fields  in  the  contempo¬ 
rary  computer  science  -  computer  vision  and  computer  graphics.  Vision  (whether 
machine  or  human)  is  to  reconstruct  the  3-D  world  from  the  observed  2-D  images, 
while  graphics  pursues  the  opposite  direction  in  designing  suitable  2-D  scene  im¬ 
ages  to  simulate  our  3-D  world.  Image  processing  is  the  crucial  middle  way  con¬ 
necting  the  two. 
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Most  abstractly,  image  processing  can  be  considered  as  an  input-output  system 


Qo 


Image  Processor  T 


Q 


Here  T  denotes  a  typical  image  processor,  for  example,  denoising,  deblurring, 
segmentation,  compression,  or  inpainting.  The  input  data  Q0  can  represent  an 
observed  or  measured  single  image  or  image  sequence,  and  the  output  Q  = 
(<7i)  <?2,  *  ■  ■ )  contains  all  the  targeted  image  features. 

For  example,  the  human  vision  system  can  be  considered  as  a  highly  involved 
multi-level  image  processor  T.  Qo  represents  the  image  sequence  that  is  con¬ 
stantly  projected  onto  the  retina.  The  output  vector  Q  contains  all  the  major  fea¬ 
tures  that  are  important  to  our  daily  life,  from  the  low-level  ones  such  as  relative 
orders,  shapes,  and  grouping  rules,  to  high-level  feature  parameters  that  help  clas¬ 
sify  or  identify  various  patterns  and  objects. 

Listed  in  Table  1  are  some  typical  image  processing  problems. 

The  two  main  ingredients  of  image  processing  are  the  input  Q0  and  the  proces¬ 
sor  T-  As  a  result,  the  two  key  issues  that  have  been  driving  the  entire  mainstream 
mathematical  research  on  image  processing  are  (a)  the  modeling  and  representa¬ 
tion  of  the  input  visual  data  Q0,  and  (b)  the  modeling  of  the  processing  operators 
T.  The  two  are  independent  but  also  closely  connected  to  each  other  by  the  uni¬ 
versal  rule  in  mathematics:  the  structure  and  performance  of  an  operator  T  is 
greatly  influenced  by  how  the  input  class  of  functions  are  modeled  or  represented. 


1.2  Image  modeling  and  representation 

To  efficiently  handle  and  process  images,  first  we  need  to  understand  what  images 
really  are  mathematically  and  how  to  represent  them.  For  example,  is  it  adequate 
to  treat  them  as  general  L2  functions,  or  a  subset  of  L2  with  suitable  regularity 
constraints?  Among  the  various  approaches,  here  we  briefly  outline  three  major 
classes  of  image  modeling  and  representation. 

Random  fields  modeling.  An  observed  image  u0  is  modeled  as  the  sampling 
of  a  random  field.  For  example,  the  Ising  spin  model  in  statistical  mechanics  can 
be  used  to  model  binary  images.  More  generally,  images  are  modeled  by  some 
Gibbs/Markovian  random  fields  [29,  57].  The  statistical  properties  of  the  fields 
are  often  established  through  the  filtering  technique  and  learning  theory.  Random 
field  modeling  is  the  most  ideal  approach  for  describing  natural  images  with  rich 
texture  patterns  such  as  trees  and  mountains. 

Wavelets  representation.  Whether  digitally  or  biologically,  an  image  is  often 
acquired  from  the  responses  of  a  collection  of  micro  sensors  (or  photo  receptors). 
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T 

Qo 

Q 

denoising + deblurring 

uq  =  Ku  +  n 

clean  &  sharp  u 

inpainting 

uo\ci\d 

entire  image  u\q 

segmenation 

u0 

“objects” 

\uk:  0/ c] 5  ^  —  1,2... 

scale-space 

u0 

multiscale  images 

("U^l  ,  U\ 2,  ■■•  ) 

motion  estimation 

optical  flows 

(fjf1),  v^2\ ...) 

Table  1:  Typical  image  processors  and  their  inputs  and  outputs.  The  symbols  rep¬ 
resent  (1)  I\:  a  blurring  kernel,  and  n:  an  additive  noise,  both  assumed  to  be  linear 
for  simplicity  in  the  current  paper;  (2)  u0:  the  given  noisy  or  blurred  image;  (3) 
f2:  the  entire  image  domain,  and  D\  a  subset  where  image  information  is  missing 
or  unaccessible;  (4)  [uk ,  Qk]:  Qks  are  the  segmented  individual  “objects,”  while 
uk  s  are  their  intensity  values;  (5)  \ks  are  different  scales,  and  u\  can  be  roughly 
understood  as  the  projection  of  the  input  image  at  scale  A;  (6)  Uq^’s  denote  the 
discrete  sampling  of  a  continuous  “movie”  uo(x,  t )  (with  some  small  time  step  h), 
yin)  ’s  are  the  estimated  optical  flows  (i.e.  velocity  fields)  at  each  moment. 


During  the  past  two  decades,  it  has  been  gradually  realized  or  experimentally  sup¬ 
ported  that  such  local  responses  can  be  well  approximated  by  wavelets.  Wavelets 
as  a  new  representation  tool  has  revolutionized  our  notion  of  images  and  their 
multiscale  structures  [25,  33].  The  new  JPEG2000  protocol  and  the  successful 
compression  of  the  FBI  fingerprints  database  are  its  two  most  influential  applica¬ 
tions.  The  theory  is  still  being  actively  pushed  forward  by  the  new  generation  of 
geometric  wavelets  such  as  curvelets  [7]  and  beamlets  [42]. 

Regularity  spaces.  In  the  linear  filtering  theory  of  conventional  digital  image 
processing,  an  image  u  is  considered  to  be  in  the  Sobolev  space  if1(D).  Sobolev 
model  works  well  for  homogeneous  regions  but  is  insufficient  as  a  global  image 
model,  since  it  “smears”  the  most  important  visual  cue  -  edges  [34].  Two  well 
known  models  have  been  introduced  to  legalize  the  existence  of  edges.  One  is 
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Mumford  and  Shah’s  “object-edge”  model  [39],  and  the  other  is  Rudin,  Osher,  and 
Fatemi’s  BV  image  model  [43].  The  object-edge  model  assumes  that  an  ideal  im¬ 
age  u  consists  of  disjoint  homogeneous  object  patches  [uk:  Qk]  with  uk  €  H1{ Qk) 
and  regular  boundaries  dQk  (characterized  by  the  1 -dimensional  Hausdorff  mea¬ 
sure).  The  BV  image  model  assumes  that  an  ideal  image  has  bounded  total  varia¬ 
tion  fn  Du ! .  All  these  regularity  based  image  models  are  generally  applicable  to 
images  with  low  texture  patterns  and  without  rapidly  oscillatory  components  [36]. 

1.3  Modeling  of  image  processors 

How  images  are  modeled  and  represented  very  much  determines  the  way  we 
model  image  processors.  We  shall  illustrate  this  viewpoint  through  the  exam¬ 
ple  of  denoising  u  =  Tu0:  u0  =  u  +  n,  assuming  for  simplicity  that  the  white 
noise  n  is  additive  and  homogeneous,  and  there  is  no  blurring  involved. 

When  images  are  represented  by  wavelets,  the  denoising  processor  T  is  in 
some  sense  “diagonalized,”  and  equivalent  to  a  simple  engineering  on  the  individ¬ 
ual  wavelet  components.  This  is  the  celebrated  results  of  Donoho  and  Johnstone 
on  the  thresholding  based  denoising  schemes  [27]. 

Under  the  statistical/random  field  modeling  of  images,  the  denoising  processor 
T  becomes  the  MAP  ( Maximum  A  Posteriori )  estimation.  By  Bayes’  formula,  the 
posterior  probability  given  an  observation  it0  is 

p(u\u0)  =  p(u0\u)p(u)/p(u0). 

The  denoising  processor  T  is  achieved  by  solving  the  MAP  problem  maxp(u  j  ?./,o). 

U 

Therefore,  besides  the  random  field  image  model  p(u),  it  is  also  important  to  know 
the  mechanism  by  which  ?j0  is  generated  from  the  ideal  image  u  (or  the  so  called 
generative  data  model).  The  two  are  crucial  for  successfully  carrying  out  Bayesian 
denoising. 

Finally,  if  the  ideal  image  u  is  modeled  as  an  element  in  certain  regular  func¬ 
tion  spaces  such  as  or  BV(H),  then  the  denoising  processor  T  can  be 

realized  by  a  variational  optimization.  For  instance,  by  Rudin-Osher-Fatemi’s  BV 
image  model,  T  is  achieved  by 

min  [  \Du\  subject  to  -  [  (u  —  u0)2 dx  <  o2 , 

“  Jn  l“l  Jn 

where  the  white  noise  is  assumed  to  be  well  approximated  by  Gaussian  iV(0,  a2). 
This  is  the  well  known  denoising  model  first  proposed  by  Rudin,  Osher,  and 
Fatemi,  and  belongs  to  the  more  general  class  of  regularized  data  fitting  models. 
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Like  using  different  coordinate  systems  to  describe  a  single  physical  object, 
the  different  formulations  of  a  same  image  processor  are  closely  interconnected. 
Again  take  de noising  for  example.  It  has  been  shown  that  the  wavelet  technique  is 
equivalent  to  an  approximate  optimal  regularization  in  certain  Besov  spaces  [23]. 
On  the  other  hand,  Bayesian  processing  and  the  regularity  based  variational  ap¬ 
proach  can  also  be  connected  (at  least  formally)  by  the  Gibbs’  formula  in  statisti¬ 
cal  mechanics  [30]  (see  the  next  section). 

1.4  Variational  PDE  method 

Having  briefly  introduced  the  general  picture  of  mathematical  image  processing, 
we  now  focus  on  the  variational  PDE  method  through  two  processors:  inpainting 
and  segmentation. 

For  the  history  and  a  detailed  description  of  the  current  developments  of  the 
variational  and  PDE  method  in  image  and  vision  analysis,  we  refer  to  the  two 
special  issues  in  IEEE  Trans.  Image  Processing  [7(3),  1998]  and  J.  Visual  Comm. 
Image  Rep.  [13(1/2),  2002],  and  also  two  recent  monographs  [2,  48]. 

In  the  variational  or  “energy”  based  models,  nonlinear  PDEs  emerge  as  one 
derives  their  formal  Euler-Lagrange  equations,  or  tries  to  locate  the  local  or  global 
minima  by  the  gradient  descent  method.  Some  of  the  PDEs  can  be  studied  by  the 
viscosity  solution  approach  [24],  while  many  others  still  remain  open  to  further 
theoretical  investigation. 

Compared  with  other  approaches,  the  variational  and  PDE  method  has  remark¬ 
able  advantages  in  both  theory  and  computation.  First,  it  allows  to  directly  handle 
and  process  visually  important  geometric  features  such  as  gradients,  tangents,  cur¬ 
vatures,  and  level  sets.  It  can  also  effectively  simulate  several  visually  meaningful 
dynamic  processes  such  as  linear  and  nonlinear  diffusions,  and  the  information 
transport  mechanism.  Secondly,  in  terms  of  computation,  it  can  profoundly  ben¬ 
efit  from  the  existing  wealthy  literature  of  numerical  analysis  and  computational 
PDEs.  For  example,  various  well  designed  shock  capturing  schemes  in  Compu¬ 
tational  Fluid  Dynamics  (CFD)  can  be  conveniently  adapted  to  edge  computation 
in  images. 


2  Variational  Image  Inpainting  and  Interpolation 

The  word  inpainting  is  an  artistic  synonym  for  image  interpolation,  as  initially 
circulated  among  museum  restoration  artists  who  manually  restore  cracked  an- 
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cient  paintings.  The  concept  of  digital  inpainting  was  first  introduced  into  digital 
image  processing  in  the  paper  by  Bertalmio  et  al.  [5].  Currently,  digital  inpainting 
techniques  have  found  broad  applications  in  image  processing,  vision  analysis, 
and  digital  technologies,  such  as  image  restoration,  disocclusion,  perceptual  im¬ 
age  coding,  zooming  and  image  super-resolution,  error  concealment  in  wireless 
image  transmission,  and  so  on  [15,  16,  17,  18].  See  for  instance  Figure  1  for  the 
application  in  error  concealment. 

We  now  discuss  the  mathematical  ideas  and  methodologies  behind  the  varia¬ 
tional  inpainting  techniques.  Throughout  this  section,  u  denotes  the  original  com¬ 
plete  image  on  a  2-D  domain  Q,  and  u0  the  observed  or  measured  portion  of  u  on 
a  subdomain  or  general  subset  D,  which  can  be  either  noisy  or  blurry.  The  goal  of 
inpainting  is  to  recover  u  on  the  entire  image  domain  Q  as  faithfully  as  possible 
from  the  available  data  u0  on  D. 

2.1  From  Shannon’s  Theorem  to  variational  inpainting 

Interpolation  is  a  classical  topic  in  approximation  theory,  numerical  analysis,  and 
signal  and  image  processing.  Successful  interpolants  include  polynomials,  har¬ 
monic  waves,  radially  symmetric  functions,  finite  elements,  splines,  wavelets,  etc. 
Despite  the  diversity  of  the  literature,  there  indeed  exists  one  most  widely  recog¬ 
nized  result  due  to  Shannon  [50],  known  as  Shannon 's  Sampling  Theorem. 

Theorem  1  (Shannon’s  Theorem)  If  a  signal  u(t )  is  bandlimited  within  (— uj,  uS), 
then, 

OO 

ii(t)  =  u  (n— sine  t  —  n^j  . 

n—  — oo 


That  is,  if  an  analog  signal  u(t)  (with  finite  energy,  or  equivalently,  in  L2(IR )) 
does  not  contain  any  high  frequencies,  then  it  can  be  perfectly  interpolated  from 
its  properly  sampled  discrete  sequence  u0[n]  =  u(nir/cu)  (with  lo/tt  known  as  the 
Nyquist  frequency). 

All  interpolation  problems  share  this  “if-then”  structure.  “If’  specifies  the 
space  where  the  target  signal  u  is  to  be  looked  for,  while  “then”  gives  the  re¬ 
construction  or  interpolation  procedure  based  on  the  discrete  samples  (or  more 
generally,  any  partial  information  about  the  signal). 

Unfortunately,  for  most  real  applications  in  signal  and  image  processing,  one 
cannot  expect  a  closed-form  formula  as  clean  as  Shannon’s.  This  is  at  least  due 
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to  two  factors.  First,  signals  like  images  are  intrinsically  not  bandlimited  because 
of  the  presence  of  edges  (or  Heaviside  type  singularities)  for  vision  analysis  and 
communication  [34].  Secondly,  for  most  real  applications,  the  given  incomplete 
data  are  often  noisy  and  even  blurred  during  the  imaging  or  transimission  pro¬ 
cesses.  Therefore,  in  the  criterion  of  Shannon’s  Theorem,  we  are  dealing  with  a 
class  of  “bad”  signals  u  with  “unreliable”  samples  u0. 

Naturally,  for  image  inpainting,  both  the  “if’  and  “then”  statements  in  Shan¬ 
non’s  Theorem  need  to  be  modeled  carefully.  It  turns  out  that  there  are  two  pow¬ 
erful  and  interdependent  frameworks  that  can  carry  out  this  task:  one  is  the  varia¬ 
tional  method,  and  the  other,  the  Bayesian  framework  [29]. 

In  the  Bayesian  approach,  the  “if ’-statement  specifies  both  the  so-called  prior 
model  and  the  data  model.  The  prior  model  specifies  how  a  priori  images  are 
distributed,  or  equivalently,  which  images  occur  more  frequently  than  the  others. 
Probabilistically,  it  specifies  the  prior  probability  p(u).  Let  it0  denote  the  incom¬ 
plete  data  that  are  observed,  measured,  or  sampled.  Then  the  second  part  of  “if’  is 
to  model  how  u0  is  generated  from  u:  u  — »■  ?/0,  or  to  specify  the  conditional  prob¬ 
ability  p(uo\u).  Finally,  in  the  Bayesian  framework,  Shannon’s  “then”-statement 
is  replaced  by  the  Maximum  A  Posteriori  (MAP)  optimization: 

max  p(u\u0)  =  p(u0\u)p(u) /p(u0),  (1) 

U 

where  we  have  spelled  out  Bayes’  formula.  (It  is  also  equivalent  to  maximizing 
the  product  of  the  prior  model  and  data  model,  since  the  denominator  is  a  fixed 
normalization  constant  once  uq  is  given.)  To  summarize,  Bayesian  inpainting 
is  to  find  the  most  probable  image  given  its  incomplete  and  possibly  distorted 
observation. 

The  variational  approach  resembles  the  Bayesian  methodology,  only  now  ev¬ 
erything  is  expressed  deterministically.  The  Bayesian  prior  model  p(u)  becomes 
the  specification  of  the  regularity  of  an  image  u,  while  the  data  model  p(u0\u) 
now  measures  how  well  the  observation  w0  is  fitted  if  the  original  image  is  in¬ 
deed  u.  Regularity  is  enforced  through  “energy”  functionals,  for  example,  the 

Sobolev  norm  E[u]  =  /  |Vit  2dx,  the  total  variation  model  of  Rudin,  Osher, 

Jn 

and  Fatemi  [31,  43]  E[u\  =  /  \Du\,  and  the  Mumford-Shah  free-boundary 

Jn 

model  [39]  E[u,T]  =  /  \Vu\2dx  +  with  H1  denoting  the  1-D  Haus- 

Jn\r 

dorff  measure.  The  quality  of  data  fitting  u  — »■  uq  is  often  judged  by  an  error 
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measure  E[u0\u\.  For  instance,  the  least  square  measure  [51]  prevails  in  the  liter¬ 
ature  due  to  the  genericity  of  Gaussian  type  noises  and  the  Central  Limit  Theorem : 

E[u0\u\  =  — —  /  (Tu  —  u0)2dx,  where  D  is  the  domain  on  which  u0  has  been 
\-L'\  J D 

sampled  or  measured,  \D\  its  area  or  cardinality  for  the  discrete  case,  and  T  de¬ 
notes  any  linear  or  nonlinear  image  processor  (such  as  blurring  and  diffusion).  In 
this  variational  setting,  Shannon’s  “then”-statement  becomes  a  constrained  opti¬ 
mization  problem: 

min E[u\  overall  u  :  E[u0\u\  <  o2. 

Here  o2  denotes  the  variance  of  the  white  noise,  which  is  assumed  to  be  known 
by  proper  statistical  estimators.  Equivalently,  the  model  solves  the  following  un¬ 
constrained  problem  using  Lagrange  multiplier:  A  [9], 

min  E  [it]  +  XE[uo\u\.  (2) 

U 

Generally,  A  expresses  the  balance  between  regularity  and  fitting.  In  summary, 
variational  inpainting  is  to  search  for  the  most  “regular”  image  that  best  fits  the 
given  observation. 

The  Bayesian  approach  is  more  universal  in  the  sense  of  allowing  general  sta¬ 
tistical  prior  and  data  models,  and  is  powerful  for  restoring  both  artificial  images 
and  natural  images  (or  textures).  But  the  learning  of  the  prior  model  and  the  data 
model  is  usually  quite  expensive.  The  variational  approach  is  ideal  for  dealing 
with  regularity  and  geometry,  but  tends  to  work  best  for  man-made  indoor  and 
outdoor  scenes,  or  images  with  low  textures.  The  two  approaches  (1)  and  (2)  can 
be  at  least  formally  unified  under  Gibbs’  formula  in  statistical  mechanics  [30]: 

E[-]  oc  — /31og p(-),  or  p(-)  oc  e~E^^7  (3) 

where  (3  =  kT  is  the  product  of  the  Boltzmann  constant  and  temperature,  and  oc 
means  equality  up  to  a  multiplicative  or  additive  constant.  (However,  the  defin¬ 
ability  of  a  rigorous  probability  measure  over  “all”  images  is  highly  non-trivial 
because  of  the  multi-scale  nature  of  images.  Recent  efforts  can  be  found  in  Mum- 
ford  and  Gidas  [38].) 

2.2  Variational  inpainting  based  on  geometric  image  models 

In  a  typical  image  inpainting  problem,  uq  denotes  the  observed  or  measured  in¬ 
complete  portion  of  a  clean  “good”  image  u  on  the  entire  image  domain  H.  A 
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simplified  but  already  very  powerful  data  model  in  various  digital  applications  is 
blurring  followed  by  noise  degradation  and  spatial  restriction: 

u0\D  =  (. Ku  +  n)D, 


where  K  is  a  continuous  blurring  kernel,  often  assumed  to  be  linear  or  even  shift- 
invariant,  and  n,  an  additive  white  noise  field  assumed  to  be  close  to  Gaussian 
for  simplicity.  u0\n\D  is  missing  or  inaccessible.  The  goal  of  inpainting  is  to 
reconstruct  u  as  faithfully  as  possible  from  uq\d.  The  data  model  is  explicitly 
given  by 

E[uq\u,  D]  =  — !—  j  (Ku  —  u0)2dx.  (4) 

D  J  D 


Therefore,  from  the  variational  point  of  view,  the  quality  of  an  inpainting  model 
crucially  depends  on  the  prior  model  or  the  regularity  energy  E[u\. 


The  TV  prior  model  E[u\ 


I 


Du\  was  first  introduced  into  image  process¬ 


ing  by  Rudin,  Osher,  Fatemi  in  [43].  Unlike  the  Sobolev  image  model  E2[u\  = 
/  V?/  2dx,  the  TV  model  legalizes  one  of  the  most  important  vision  features  - 

J  n 

the  “edges.”  For  example,  for  a  cartoon  image  u  showing  the  night  sky  (u  =  0) 
with  a  full  bright  moon  (u  =  1),  the  Sobolev  energy  blows  up,  while  the  TV  en¬ 


/ 


ergy  /  \Du\  =  the  perimeter  of  the  moon,  which  is  finite.  Therefore,  in  combi¬ 


nation  with  the  data  model  (4),  the  variational  TV  inpainting  model  is  to  minimize 


Etv[u\u0,  D\  =  a  /  \Du\  +  A  /  ( Ku  —  u0)2dx .  (5) 

Jn  J  d 

The  admissible  space  is  BV(fl),  the  Banach  space  of  all  functions  with  bounded 
variation  [31].  We  observe  that  it  is  very  similar  to  the  celebrated  TV  restoration 
model  of  Rudin,  Osher,  and  Fatemi  [43].  In  fact,  the  beauty  and  power  exactly 
lie  in  that  the  model  provides  a  unified  framework  for  denoising,  deblurring,  and 
image  reconstruction  from  incomplete  data.  Figure  1  displays  the  computational 
output  of  the  model  as  applied  to  the  error  concealment  of  a  blurry  image  with 
simulated  random  packet  loss  due  to  the  transmission  failure  of  a  network. 

The  second  well-known  prior  model  is  Mumford-Shah’s  object-edge  model  [39] . 
Unlike  TV,  the  edge  set  T  is  now  explicitly  singled  out,  and  an  image  u  is  under¬ 
stood  as  the  combination  of  both  the  geometric  feature  T  and  the  piecewise  smooth 
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Figure  1 :  TV  inpainting  for  the  error  concealment  of  a  blurry  image. 

“objects”  Ui  s  on  all  the  connected  components  fV s  of  \  T.  Thus  in  both  the 
Bayesian  and  variational  languages,  the  prior  model  consists  of  two  parts  (see  (3)): 

p(u,  T)  =  p(rt|r)p(T)  and 
E[u,T]  =  E[u\T]  +  E[T]. 

In  the  Mumford-Shah  model,  the  edge  regularity  is  specified  by  i?[r]  =  ff1(T), 
the  1-D  Hausdorff  measure,  or  as  in  most  computational  applications,  £’[F]  = 
length(T)  assuming  that  T  is  Lipschitz.  The  smoothness  of  the  “objects”  is  nat¬ 
urally  characterized  by  the  ordinary  Sobolev  norm:  ^[w-jr]  =  /  \Vu\2dx. 

Jn\r 

Therefore,  in  combination  with  the  data  model  (4),  the  variational  inpainting 
model  based  on  the  Mumford-Shah  prior  is  given  by 

inf  £’ms['u,rj'u0,  D]  = 

u,T 

r  f  (6) 

a  /  \Vu\2dx  +  PH\T)  +  A  /  (Ku  -  u0)2dx. 

Jn\r  Jd 

Figure  2  shows  one  application  of  this  model  for  text  removal  [28].  Note  edges 
are  preserved  and  smooth  regions  remain  smooth. 

Numerous  applications  have  demonstrated  that,  for  classical  applications  in 
denoising,  deblurring  or  segmentation,  both  the  TV  and  Mumford-Shah  models 
perform  sufficiently  well  even  by  the  high  standard  of  human  vision.  But  inpaint¬ 
ing  does  have  its  special  identity.  We  have  demonstrated  in  [15,  28]  that  for 
large-scale  inpainting  problems,  high  order  image  models  which  incorporate  the 
curvature  information  become  necessary  for  more  faithful  visual  effects. 
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Inpainting  output 


Image  to  be  inpainted 


e  Penguin 

guys 

'tiiat%ft-jnany 
de  a 


Hmage 

^tionToS 
IsTtrue?  We 
^agree.  We  dre 
!Tore  optimisti^The  g 


Inpainting  domain  (or  mask) 

Hello!  We  are  Penguin 
A  and  B.  You  guys 
must  think  that  so  many 
words  have  made  a 
large  amount  of  image 
information  lost. 

Is  this  true?  We 
disagree.  We  are 
more  optimistic.  The 


Figure  2:  Mumford-Shah  inpainting  for  text  removal. 


The  key  to  high  order  geometric  image  models  is  Euler’s  elastica  curve  model: 

e[l\—  /( a  +  bK,2)ds ,  a,b>  0, 

J  i 

where  k  denotes  the  scalar  curvature.  Birkhoff  and  de  Boor  [6]  called  it  the  “non¬ 
linear  spline”  model  in  approximation  theory.  It  was  first  introduced  into  computer 
vision  by  Mumford  [37].  Unlike  straight  lines  (for  which  b  =  0),  the  elastica 
model  allows  smooth  curves  because  of  the  curvature  term,  which  is  important  for 
computer  vision  and  computer  graphics. 

By  imposing  the  elastica  energy  on  each  individual  level  lines  of  u  (at  least 
symbolically  or  by  assuming  that  u  is  regular  enough),  we  obtain  the  so-called 
elastica  image  model: 


Ee.  1  \U 


/oo 

e[u  =  X]dX 

•oo 

/oo  n 

/  (a  +  bn2)dsdX 

■oo  J u=X 

=  (a  +  bK2)\Vu\dx. 

Jn 


(7) 


In  the  last  integrand,  the  curvature  is  given  by  k  =  V  •  [Vit/|  Vit|].  (Notice  that 
in  the  absence  of  the  curvature  term,  the  above  formula  is  exactly  the  co-area 
formula  for  smooth  functions  [31].)  This  elastica  prior  model  was  first  studied 
for  inpainting  by  Masnou  and  Morel  [35],  and  Chan,  Kang,  and  Shen  [11],  and  it 
improves  the  TV  inpainting  model  as  expected. 

Similarly,  the  Mumford-Shah  image  model  Ems  can  also  be  improved  by  hav- 
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ing  the  length  energy  replaced  by  Euler’s  elastica  energy: 


Emse[u,  T]  =  a  /  j Vu\2dx  +  e[r], 

Jn\r 

It  was  first  applied  to  image  inpainting  by  Esedoglu  and  Shen  [28].  Figure  3  shows 
one  example  of  applying  this  image  prior  model  to  the  inpainting  of  an  occluded 
disk.  Both  the  TV  and  Mumford-Shah  inpainting  models  would  complete  the 
interpolation  with  a  straight  line  edge  and  introduce  visible  corners  a  result.  The 
elastica  model  restores  the  smooth  boundary. 


Figure  3:  Smooth  inpainting  by  the  Mumford-Shah-Euler  model. 

The  improved  performance  of  curvature  based  models  comes  at  a  price,  both 
in  terms  of  theory  and  computation.  The  existence  and  uniqueness  of  the  TV  and 
Mumford-Shah  inpainting  models  can  be  studied  in  a  fashion  similar  to  the  classi¬ 
cal  restoration  and  segmentation  problems.  But  theoretical  study  on  the  high  order 
models  is  only  in  the  very  beginning.  The  difficulty  lies  in  the  involvement  of  the 
second  order  geometric  feature  -  curvature,  and  the  identification  of  a  proper  func¬ 
tion  space  to  study  the  models.  Secondly  in  terms  of  computation,  the  calculus 
of  variation  on  the  curvature  term  leads  to  fourth-order  highly  nonlinear  PDEs, 
whose  fast  and  efficient  numerical  solution  imposes  a  tremendous  challenge. 

Let  us  conclude  this  section  with  a  brief  discussion  on  computation,  especially 
for  the  TV  and  Mumford-Shah  inpaintings. 

For  the  TV  inpainting  model  Etv.  the  Euler-Lagrange  equation  is  formally  (or 
assuming  that  u  is  in  the  Sobolev  space  IF1,1)  given  by 

+  hK*xd{Ku  -  u0)  =  0.  (8) 
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Here  K*  denotes  the  adjoint  of  the  linear  blurring  kernel  K.  the  multiplier  xd(x) 
the  indicator  of  D ,  and  fj,  =  2X/a.  The  boundary  condition  along  is  Neumann 
adiabatic,  to  eliminate  any  boundary  contribution  during  the  integration-by-parts 
process.  This  nonlinear  PDE  can  be  solved  iteratively  by  the  freezing  technique: 
let  ul'n)  denote  the  current  inpainting  at  step  n.  then  the  updated  inpainting  w(n+1) 
solves  the  linearized  PDE 

v  '  [^^J  +  uK-Xd(Ku^»  -  „„)  =  0. 

In  practice,  the  intermediate  diffusivity  coefficient  1/|Vm^|  is  often  modified  to 
1  / y/|  |2  +  e2  for  some  small  conditioning  parameter  e,  or  by  the  mandatory 

ceiling  and  flooring  between  e  and  1/e.  The  convergence  of  such  algorithms  have 
been  well  studied  in  the  literature  [9,  26].  There  are  also  many  other  techniques 
possible  for  solving  (8)  in  the  literature,  for  example,  see  Vogel  and  Oman  [54], 
and  Chan,  Mulet,  and  Golub  [10].  We  only  need  to  relate  (8)  to  the  conventional 
TV  restoration  case. 

The  computation  of  the  Mumford-Shah  inpainting  model  is  also  very  inter¬ 
esting.  Unlike  segmentation,  for  inpainting,  one’s  direct  interest  is  only  in  u,  not 
T.  Such  understanding  makes  the  T-convergence  approximation  theory  perfect 
for  inpainting.  According  to  Ambrosio  and  Tortorelli  [1],  by  introducing  an  edge 

signature  function  z (x)  G  [0,1],  x  €  H,  and  having  E [w|T]  =  a  {Vu'fdx 

Jn\r 

replaced  by  -E[w|z]  =  a  I  z2\Vu\2dx,  one  can  approximate  the  length  energy  in 

Jn 

the  Mumford-Shah  model  by  a  quadratic  integral  in  z  (up  to  a  constant  multiplier): 

r  i  off e|V^|2  (z  —  l)2 \ 

Thus  the  Mumford-Shah  inpainting  model  is  approximated  by 

Ee[u,  z\uq,  D]  =  E'fulz]  +  Ee[z]  +  XE[u0\u,  D ], 

which  is  a  quadratic  integral  in  both  u  and  z\  It  leads  to  a  coupled  system  of  linear 
elliptic  type  of  PDE’s  in  both  u  and  the  edge  signature  z,  which  can  be  solved 
efficiently  using  any  numerical  elliptic  solver.  The  example  in  Figure  2  has  been 
computed  by  this  scheme  [28]. 

Finally  we  should  also  mention  some  of  the  major  applications  of  the  inpaint¬ 
ing  and  geometric  image  interpolation  models  developed  above.  These  include 
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digital  zooming,  primal-sketch  based  perceptual  image  coding,  error  concealment 
for  wireless  image  transmission,  and  progressive  disocclusion  in  computer  vi¬ 
sion  [15].  Extensions  to  color  or  more  general  hyperspectral  images,  and  nonflat 
image  features  (i.e.,  that  live  on  Riemannian  manifolds  [14])  are  also  currently 
being  studied  in  the  literature.  Other  approaches  to  the  inpainting  problem  can  be 
found  in  [3,  5,  8].  In  particular,  it  has  been  just  recently  pointed  out  in  [4]  that 
the  PDE  inpainting  model  in  [5]  is  closely  related  to  the  stream  function-vorticity 
equation  in  fluid  dynamics. 


3  Variational  Level  Set  Image  Segmentation 

Images  are  the  proper  2-D  projections  of  the  3-D  world  containing  various  ob¬ 
jects.  To  successfully  reconstruct  the  3-D  world,  or  at  least  approximately,  the 
first  crucial  step  is  to  identify  the  regions  in  images  that  correspond  to  individual 
objects.  This  is  the  well  known  problem  of  image  segmentation.  It  has  broad 
applications  in  a  variety  of  important  fields  such  as  computer  vision  and  medical 
image  processing. 

Denote  by  uq  an  observed  image  on  a  2-D  Lipschitz  open  and  bounded  do¬ 
main  fl  Segmentation  is  to  find  a  visually  meaningful  edge  set  T  which  leads  to 
a  complete  partition  of  Q.  Each  connected  component  D,  of  fi  \  T  should  corre¬ 
spond  to  at  most  one  real  physical  object  or  pattern  in  our  3-D  world,  for  example, 
the  white  matter  in  brain  images  or  the  abnormal  tissues  in  organs.  In  some  ap¬ 
plications,  one  is  also  interested  in  the  clean  image  patches  vn  on  each  Q,  of  the 
segmentation,  since  u0  is  often  noisy. 

Therefore,  there  are  two  crucial  ingredients  in  the  mathematical  modeling  and 
computation  of  the  segmentation  problem.  The  first  is  how  to  formulate  a  model 
that  appropriately  combines  the  effects  of  both  the  edge  set  T  and  its  segmented 
regions  {ffj,  i  =  1,  2,  •  •  •  }.  And  the  other  is  to  find  the  most  efficient  way  to 
represent  the  geometry  of  both  the  edge  set  and  the  regions,  and  to  represent  the 
segmentation  model  as  a  result.  This  of  course  reflects  the  general  philosophy  in 
the  introduction  section. 

In  the  variational  PDE  approach,  these  two  issues  have  found  good  answers 
in  the  literature.  For  the  first,  one  observes  the  celebrated  segmentation  model  of 
Mumford  and  Shah  [39],  and  the  second,  the  level  set  representation  technology  of 
Osher  and  Sethian  [40].  In  what  follows,  we  detail  our  recent  efforts  in  advancing 
the  application  of  the  level  set  technology  to  various  Mumford-Shah  related  image 
segmentation  models.  Much  of  the  works  can  be  found  in  our  papers  [19,  20,  22, 
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21,  53],  and  also  in  the  related  works  [52,  55,  41,  44,  58]. 

We  start  with  a  novel  active  contour  model  whose  formulation  is  independent 
of  intensity  edges  defined  by  the  gradients,  in  contrast  to  most  conventional  ones 
in  the  literature.  We  then  explain  how  this  model  can  be  efficiently  computed 
based  on  the  multi-phase  level  set  method.  In  the  second  part,  we  extend  these 
results  to  the  level  set  formulation  and  computation  of  the  general  Mumford-Shah 
segmentation  model  for  piecewise  smooth  images.  In  the  last  part,  we  present 
our  recent  work  on  extending  the  previous  models  to  the  logical  operations  on 
multi-channel  image  objects. 

3.1  Active  contours  without  edges  and  multi-phase  level  sets 

Active  contour  is  a  powerful  tool  in  image  and  vision  analysis  for  boundary  detec¬ 
tion  and  object  segmentation.  The  key  idea  is  to  evolve  a  curve  so  that  it  eventually 
stops  along  the  object  edges  of  the  given  image  u0.  The  curve  evolution  is  con¬ 
trolled  by  two  sorts  of  energies:  the  internal  energy  which  defines  the  regularity 
of  the  curve,  and  the  external  one  determined  by  the  given  image  u0.  The  latter  is 
often  called  the  feature-driven  energy. 

In  almost  all  classical  active  contour  models,  the  feature-driven  energies  rely 
heavily  on  the  gradient  feature  \  Vu0  \  or  its  smoothed  version  V G„  *it0  j ,  where  G„ 
denotes  a  Gaussian  kernel  with  a  small  variance  a.  They  work  well  for  detecting 
gradient-defined  edges,  but  fail  for  more  general  classes  of  edges,  such  as  the 
boundary  of  a  nebula  in  some  astronomic  images  or  the  top  image  in  Figure  4. 

Our  new  model  -  active  contours  without  edges ,  first  introduced  in  [19,  20], 
is  formulated  independent  of  the  gradient  information,  and  therefore  can  handle 
more  general  types  of  edges.  The  model  is  to  minimize  the  energy 

E2[ci,  c2,  r|w0]  =  /  \u0(x)  -  Ci\2dx  +  /  \u0(x)  -  C2\2dx  +  u\r\,  (9) 

Jint(T)  Jext(T) 

where  v  >  0,  and  int(T)  and  ext(T)  denote  the  interior  and  exterior  of  T,  and  |T| 
its  length.  The  subscript  2  in  E2  indicates  that  it  deals  with  two-phase  images,  i.e., 
ones  whose  “objects”  can  be  completely  indexed  by  the  interior  and  exterior  of  T. 

In  the  level  set  formulation  of  Osher  and  Sethian  [40],  T  is  embedded  as  the 
zero  level  set  {( j>  =  0}  of  a  Lipschitz  continuous  function  (j>  :  — >  R.  Conse¬ 
quently,  {( j)  >  0}  and  {(j)  <  0}  define  the  interior  Q+  and  exterior  52“  of  the  curve. 
(Computationally,  the  level  set  approach  is  superior  to  other  curve  representations 
in  both  letting  one  directly  work  on  a  fixed  rectangular  grid  and  allowing  au¬ 
tomatic  topological  changes  such  as  merging  and  breaking.)  Denote  by  H  the 
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1-dimensional  Heaviside  function:  H(z)  =  1  if  z  >  0  and  0  if  z  <  0.  Then  the 
energy  in  our  model  becomes 


E2[c!,  c2,  (f)\uo\  =  /  \u0(x)  -  ci\2H{(f))dx  +  /  \u0(x) 

Jn  Jn 

'  [  \VH(</>)\dx. 

Jn 


+  v 


c2|2(1  -  H{4>))dx 


Minimizing  E2  [ci ,  c2 ,  <j)  \  uo]  with  respect  to  c\ ,  c2  and  <p  leads  to  the  Euler-Lagrange 
equation: 


dcf) 

dt 

ci  (t) 


m 


z/div 


(—)- 

MW 

fn  uo(x)H ((f>(x))dx 

fnH(Hx))dx 


Uo  —  Cl  |2  +  \u0  —  c2|2  , 

=  fnu0(x)(  1  -  H(4>(x)))dx 

LI1  -  H(<l>(x)))dx 


with  a  suitable  initial  guess  <^(0,:r)  =  (j)o(x).  In  numerical  implementations,  the 
Heaviside  function  H(z)  is  often  regularized  by  some  He(z )  in  C'1(i?),  so  that 
as  e  — >  0,  the  latter  converges  to  H(z)  in  some  suitable  sense.  As  a  result,  the 
Dirac  function  5{z)  in  the  last  equation  is  regularized  to  S£(z)  =  H'e(z).  We 
have  discovered  in  [20]  that  a  carefully  designed  approximation  scheme  can  even 
allow  interior  contours  to  emerge,  which  has  been  a  challenging  task  for  most 
conventional  algorithms.  Also  notice  that  the  length  term  in  the  energy  has  led  to 
the  mean  curvature  motion. 

The  model  performs  as  an  active  contour  in  the  class  of  piecewise  constant 
images  taking  only  two  values,  looking  for  a  two-phase  segmentation  of  a  given 
image.  The  internal  energy  is  defined  by  the  length,  while  the  external  energy 
is  independent  of  the  gradient  |V«o|-  Defining  the  segmented  image  by  u(x)  = 
ciH(cf)(x ))  +  c2(l  —  H((f)(x))),  we  realize  that  the  energy  model  is  exactly  the 
Mumford-Shah  segmentation  model  [39]  restricted  to  the  class  of  piecewise  con¬ 
stant  images.  However,  our  model  was  initially  developed  from  the  active  contour 
point  of  view. 

Two  typical  numerical  outputs  of  the  model  are  displayed  in  Figure  4.  The  top 
row  shows  that  our  model  can  segment  and  detect  objects  without  clear  gradient 
edges.  The  bottom  one  shows  that  it  can  also  capture  complicated  boundaries  and 
interior  contours. 

For  more  complicated  situations  where  multiple  objects  occlude  each  other 
and  multi-phase  edges  such  as  T-junctions  emerge,  the  above  two-phase  active 
contour  model  becomes  insufficient  and  we  need  to  introduce  more  than  one  level 
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Figure  4:  Top:  detection  of  a  simulated  minefield  by  our  new  active  contour 
model.  Bottom:  segmentation  of  an  MRI  brain  image.  Notice  that  the  interior 
boundaries  are  automatically  detected. 

set  functions.  In  [22,  53],  we  generalize  the  above  framework  to  multi-phase 
active  contours,  or  equivalently,  the  piecewise  constant  Mumford-Shah  segmenta¬ 
tion  with  multi -phase  regions: 

inf  Ems[u,  r|u0]  =  /  \u0  -  Ci\2dx  +  v\T\.  (10) 

u’r  i  Jiii 

Here,  fV s  denote  the  connected  components  of  \  T,  and  u  =  c,  on  H*.  Notice 
that  T  can  now  be  a  general  set  of  edge  curves,  including  for  example  the  T- 
junction  class. 

Generally,  consider  m  level  set  functions  d)t  :  H  — »■  IR.  The  union  of  the  zero- 
level  sets  of  4>i  represents  the  edges  in  the  segmented  image.  Using  these  m  level 
set  functions,  one  can  define  up  to  n  =  2m  phases,  which  form  a  disjoint  and 
complete  partitioning  of  Q.  Therefore,  each  point  x  e  Q  belongs  to  one  and  only 
one  phase.  In  particular,  there  is  no  vacuum  or  overlap  among  the  phases.  This 
is  an  important  advantage,  compared  with  the  classical  multi-phase  representation 
in  [44,  56],  where  a  level  set  function  is  associated  to  each  phase,  and  therefore 
more  level  set  functions  are  needed.  Figure  5  shows  two  typical  examples  of 
multi -phase  partitioning  corresponding  to  m  =  2  and  3. 

We  now  illustrate  the  multi-phase  level  set  approach  through  the  example  of 
n  =  4  and  m  =  2.  Let  c  =  (cn,  Cio,  cqi,  cqo)  denote  a  constant  vector,  and 
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Figure  5:  Left:  two  curves  given  by  fa  =  0  and  fa  =  0,  partition  the  domain  into 
four  regions  based  on  indicator  vector  (sign(^i),  sign(</>2)).  Right:  Three  curves 
given  by  fa  =  0,  fa  =  0,  and  <^3  =  0,  partition  the  domain  into  eight  regions 
based  on  the  triple  (sign(<^1),  sign(</>2),  sign(^3)). 

$  =  (fa,  fa)  the  two-phase  level  set  vector.  Then  we  are  looking  for  an  ideal 
image  u  in  the  form  of 

u  =  cn H(fa)H(fa)  +  c10H(fa)(l  -  H(fa)) 

+coi(l  -  H(fa))H(fa)  +  C00(l  -  Hfa i))(l  -  H(fa)). 

The  Mumford-Shah  segmentation  energy  becomes 

E4[c,fau0]=  /  \u0(x)  -  c11\2H(fa)H(fa)dx 
Jn 

+  f  | uQ{x)  -  c10\2H(fa)(l  -  H(fa))dx 
Jn 

+  [  \u0(x)  -  c0i|2(l  -  H(fa))H(fa)dx  (11) 

Jn 

+  [  I u0(x)  -  C00|2(l  -  H(fa))(l  -  H(fa))dx 
Jn 

[  \VH(fa)\dx  +  v  [  \VH(fa)\dx. 

Jn  Jn 

Its  minimization  leads  to  the  Euler-Lagrange  equations.  First,  with  $  fixed,  the  c 
minimizer  can  be  explicitly  worked  out  as  before: 

Cij(t)  =  average  of  u0  on  {(2 i  —  l)fa  >  0,  (2 j  —  l)fa  >0}  i,j  =  0, 1. 
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In  return,  this  new  c  information  leads  to  the  Euler-Lagrange  equations  for  <!>: 

^t±  =  sfa)  ^iv(y^^j)  -  ((“o  -  cn)2  -  («0  -  c0i)2)ff(^2) 

-  ((«o  -  Go)2  -  («o  -  coo)2)  (1  -  H((f)2))  , 

=  S((f)2 )  z/div(y^j)  -  ((«0  -  cn)2  -  (u0  -  c0i)2)tf(>i) 

-  (Oo  -  cio)2  -  (u0  ~  c00)2)  (1  -  H(<f> i))  . 

Note  that  the  equations  are  governed  by  both  the  mean  curvatures  and  jumps  of 
the  data  energy  terms  across  the  boundary. 

Figure  6  shows  an  application  of  the  model  to  the  medical  analysis  of  a  brain 
image.  Displayed  are  the  final  segmented  image  and  its  associated  four  phases. 
Our  model  successfully  identifies  and  segments  the  white  and  gray  matters. 

Recently  the  above  models  and  algorithms  have  been  extended  to  multi-channel, 
volumetric,  and  texture  images  in  [12,  13,  46].  Let  us  give  a  little  more  de¬ 
tails  about  texture  segmentation  from  [46].  Texture  images  refer  to  general  im¬ 
ages  of  natural  scenes,  such  as  grasslands,  beaches,  rocks,  mountains,  and  human 
body  tissues.  They  typically  carry  certain  coherent  structures  in  scales,  orienta¬ 
tions,  and  local  frequencies.  To  segment  texture  images  using  the  above  models, 
we  first  apply  Gabor’s  filters  to  extract  these  coherent  structures.  The  filter  re¬ 
sponses  create  a  new  vectorial  (or  multi-channel)  feature  image  in  the  form  of 
U(x )  =  (va(x).  up (x).  ■  ■  ■  ,  u7(x)),  where  the  Greek  letters  stand  for  the  filter 
signatures  and  typically  each  takes  a  value  of  (scale,  orientation,  local  frequency). 
We  then  apply  the  vectorial  active-contour-without-edges  model  to  the  segmenta¬ 
tion  of  U.  Figure  7  shows  one  typical  example. 

3.2  Piecewise  smooth  Mumford-Shah  segmentation 

The  most  general  Mumford-Shah  piecewise  smooth  segmentation  [39]  is  defined 
by 

2dx  +  n  I  | Vu\2dx  +  z/jT|,  (12) 
Jn\r 

where  [i.  v  are  positive  parameters.  It  allows  the  segmented  “objects”  to  have 
smoothly  varying  intensities,  instead  of  being  strictly  constant.  We  now  show 


inf  Ems[u,T\u0] 


J  \u~ 
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Initial 


Segmented 


Figure  6:  The  original  and  segmented  images  (top  row),  and  the  final  four  seg¬ 
ments  (the  rest). 


21 


Figure  7:  An  example  of  texture  segmentation  (at  increasing  times). 


how  to  carry  out  the  model  based  on  the  multi-phase  level  set  approach  [20].  As 
before,  we  start  with  the  two-phase  situation  where  a  single  level  set  function  (f)  is 
sufficient,  followed  by  the  more  general  multi-phase  case. 

In  the  two-phase  situation,  the  ideal  image  u  is  segmented  to  u ±  by  the  level 
set  function  < p: 


u(x )  =  u+(x)H(cf)(x))  +  u  (:r)(l  —  H{(f>{x))). 


We  assume  that  both  u+  and  u  are  C1  functions  up  to  the  boundary  {<^>  =  0}. 
Substituting  this  expression  into  (12),  we  obtain 


E[u+,u  ,(f)\uo]=  /  \u+  —  uo\2H(cj))dx  +  /  | u  —  u0\2(l  —  H((f>))dx 

Jn  J  n 

+H  [  \Vu+\2H{(f)))dx  +  /j,  [ \Vu~\2{l  -H{(f)))dx  +  u  [ \VH((/>)\. 

Jn  Jn  Jn 


(13) 


First  with  <f>  fixed,  the  variation  on  E[u+,u  ,  (j)\uo\  leads  to  the  two  Euler- 
Lagrange  equations  for  separately: 

u±  —  uq  =  /iA u±  on  ±  (j)  >  0,  — —  =  0  on  {(f)  =  0}.  (14) 

on 

(Here  ±  takes  either  +  or  — ,  but  uniformly  across  the  formula.)  They  act  as  the 
denoising  operators  on  the  homogeneous  regions  only.  Notice  that  no  smoothing 
is  done  across  the  boundary  {(f)  =  0},  which  is  very  important  in  image  analysis. 
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Next,  keeping  the  functions  u+  and  u  fixed,  and  minimizing  E  [u+ ,  u  ,  d>  j  ?/,0] 
with  respect  to  <f>,  we  obtain  the  motion  of  the  zero  level  set: 


d(f) 

dt 


m 


-  (I“H 


«o!2  +  A4 1  Vw+|2 


U  -  M0|2  -  ju|Vu  |2)  , 


with  some  initial  guess  (j)(t  =  0,  x).  The  above  equation  is  actually  computed  at 
least  near  a  narrow  band  of  the  zero  level  set.  As  a  result,  computationally,  we  have 
to  continuously  extend  both  u+  and  u~  from  their  original  domains  {±<p  >  0}  to 
a  suitable  neighborhood  of  the  zero  level  set  {< j>  =  0}.  Figure  8  displays  an 
application  of  the  model  in  astronomical  image  analysis.  Although  the  nebula 
itself  does  not  seem  to  be  a  smooth  object,  the  piesewise  smooth  model  can  still 
correctly  capture  all  the  main  features. 


Figure  8:  Numerical  result  from  the  piecewise  smooth  Mumford-Shah  level  set 
algorithm  with  one  level  set  function. 

As  in  the  previous  section,  there  are  cases  where  the  boundaries  forming  a 
complete  partition  of  the  image  cannot  be  represented  by  a  single  level  set  func¬ 
tion.  Then  one  has  to  turn  to  the  multi-phase  approach.  In  our  papers,  thanks  to 
the  planer  Four  Color  Theorem,  we  have  been  able  to  conclude  that  two  level  set 
functions  are  sufficient  for  any  multi -phase  partition  problems. 

By  the  Four  Color  Theorem,  one  can  color  all  the  regions  in  a  partition  using 
only  four  colors,  so  that  any  two  adjacent  regions  are  color  distinguishable.  Iden- 
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tifying  a  phase  with  one  color,  we  see  that  two  level  set  functions  (f> i  and  <f>2  are 
sufficient  to  produce  four  “colors:”  {±<pi  >  0,  ±o2  >  0}.  Therefore,  they  can 
completely  segment  a  general  image  with  a  multi-phase  boundary  set  T  given  by 
{(f) i  =  0}  or  {(f) 2  =  0}.  As  before,  we  do  not  have  the  problems  of  “overlapping” 
or  “vacuum”  as  in  [56,  44].  Note  that  by  this  formulation,  generally  each  “color” 
can  still  have  many  isolated  components.  Therefore,  the  segmentation  is  complete 
only  after  one  applies  an  extra  step  of  the  well  known  topological  processor  for 
finding  the  connected  components  of  an  open  set. 

In  this  four-phase  formulation,  the  ideal  image  u  is  segmented  into  four  dis¬ 
joint  but  complete  parts  u±:k,  each  defined  by  one  of  the  four  phases: 

{±<^i  >  0,  ±(f)2  >  0}. 

Overall,  by  using  the  Heaviside  function,  we  obtain  the  following  synthesis  for¬ 
mula: 


u  =  u++if  (&)#(&)  +  1  -  H(<h)) 

+u~+(  1  -  +  u—{  1  -  # (<M)(1  - 

for  all  x  6  C2.  We  can  express  in  a  similar  way  the  energy  function  of  u  and 
$  =  ((f) i,  (f)2),  and  derive  the  corresponding  Euler-Lagrange  equations. 

Notice  the  remarkable  feature  of  this  single  model,  which  includes  both  the 
original  energy  formulation  and  the  elliptic  and  evolutionary  PDEs:  it  naturally 
combines  all  the  three  image  processors  -  active  contour,  segmentation,  and  de- 
noising. 

3.3  Logic  operators  for  multi-channel  image  segmentation 

In  a  multi-channel  image  u(x)  =  (ui(x),  u2(x),  •  •  •  ,  un(x)),  a  single  physical  ob¬ 
ject  can  leave  different  traces  in  different  channels.  For  example,  Figure  9  shows 
a  two-channel  image  containing  a  triangle  which  is  however  incomplete  in  each 
individual  channel.  For  this  example,  most  conventional  segmentation  models  for 
multi-channel  images  [12,  13,  32,  47,  49,  58]  would  output  the  complete  triangle, 
i.e.,  the  union  of  both  channels.  The  union  is  just  one  of  the  several  possible  log¬ 
ical  operations  for  multi-channel  images.  For  example,  the  intersection  and  the 
differentiation  are  also  very  common  in  applications,  as  illustrated  in  Figure  10. 

In  this  section,  we  outline  our  recent  efforts  in  developing  logical  segmenta¬ 
tion  schemes  for  multi-channel  images  based  on  the  active-contour-without-edges 
model  [45]. 
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Figure  9:  A  synthetic  example  of  an  object  in  two  different  channels.  Notice  that 
the  lower  left  corner  of  A1  and  the  upper  corner  of  A2  are  missing. 

A i  u  a2  Ai  n  A2  Ai  n  ~'A2 

▲  4. 


Figure  10:  Different  logical  combinations  for  the  sample  image:  the  union,  the 
intersection,  and  the  differentiation. 


First,  we  define  two  logical  variables  to  encode  the  information  inside  and 
outside  the  contour  T  separately  for  each  channel  i: 


4>Ur)  =  j  J 
z“'K,x,r)  =  |  J 


if  x  is  inside  T  and  not  on  the  object, 
otherwise; 


if  x  is  outside  T  and  on  the  object, 
otherwise. 


Such  different  treatments  are  motivated  by  the  energy  minimization  formulation. 
Intuitively  speaking,  in  order  for  the  active  coutour  T  to  evolve  and  eventually 
capture  the  exact  boundary  of  the  targeted  logical  object,  the  energy  should  be 
designed  so  that  both  partial  capture  and  over  capture  lead  to  high  energies  (corre¬ 
sponding  to  z°ut  =  1  and  z™  =  1  separately).  Imagine  that  the  target  object  is  the 
tumor  tissue,  then  in  terms  of  decision  theory,  over  and  partial  captures  correspond 
to  false  alarms  and  misses  separately.  Both  are  to  be  penalized. 

In  practice,  we  do  not  have  the  precise  information  of  “the  object,”  which  is 
exactly  to  be  segmented.  One  possible  way  to  approximate  and  z°ut  is  based 
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Truth  table  for  the  two-channel  case 


~in 

Z1 

~in 

z2 

„out 

Z1 

„out 

z2 

A1  U  A  2 

Ai  n  a-2 

A\  n  A2 

x  inside  T 
(or  x  G  fl+) 

l 

l 

0 

0 

1 

i 

1 

l 

0 

0 

0 

0 

l 

1 

0 

i 

0 

0 

0 

l 

0 

0 

0 

0 

0 

0 

0 

1 

x  outside  T 
(or  x  G  f 

0 

0 

1 

1 

1 

1 

0 

0 

0 

1 

0 

1 

0 

1 
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0 

0 

1 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

Table  2:  The  truth  table  for  two  channels.  Notice  that  inside  T  “true”  is  repre¬ 
sented  by  0.  It  is  so  designed  to  encourage  the  contour  to  enclose  the  targeted 
logical  object  at  a  lower  energy  cost. 


on  the  interior  (fl+)  and  exterior  (fl  )  averages  cf  in  channel  i: 


zr«x,r) 


_ Uq(x)  -4 12 

max2/en+  \ul(y)  -  cj I2’ 
-  c~|2 

max^en-  |«o (v)  ~  ci  I2’ 


for  x  G  fi+; 
for  x  G  . 


The  desired  truth  table  can  then  be  described  using  the  z™’ s  and  z°ut’ s.  In 
Table  2,  we  have  shown  three  examples  of  logical  operations  for  the  two-channel 
case.  Notice  that  “true”  is  represented  by  0  inside  T.  It  is  so  designed  to  encourage 
energy  minimization  when  the  contour  tries  to  capture  the  targeted  object  inside. 

We  then  design  continuous  objective  functions  to  smoothly  interpolate  the  bi¬ 
nary  truth  table.  This  is  because  in  practice,  as  mentioned  above,  the  z’s  are 
approximated  and  take  continuous  values.  For  example,  the  possible  interpolants 
for  the  union  and  intersection  can  be: 


/AilMaOc)  =  \Jz\n(x)z™(x)  +  (1  -  y/(l  ~  ^(x))(  1  -  Z™\x))), 

fA,nA2{.x)  =  1  -  l-  Zf(x))(l  -  4n(X>)  +  VzlUt(x)z?2Ut(x). 

The  square  roots  are  taken  to  keep  them  of  the  same  order  as  the  original  scalar 
models.  It  is  straightforward  to  extend  the  two-channel  case  to  more  general  71- 
channel  ones. 
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The  energy  functional  E  for  the  logical  objective  function  /  can  be  expressed 
by  the  level  set  function  <f>.  Generally,  as  just  shown  above,  the  objective  function 
can  be  separated  into  two  parts, 


/  =  f(4n 


„ out 


C.O  =  /*.(*”■-■  ,<")  +  /«*(*! 


out 


v out 


)• 


The  energy  functional  is  then  defined  by 

E[(j)\c+,  c~]  =  /rlength(</>  =  0)  +  A  f  [, fin(z\n ,  •  •  •  , 

Jn 

+u(zr\  -  ■  ■  -  Hm\dx. 

Here  each  c1*1  =  (cf.  ■  ■  ■  .  c^)  is  in  fact  a  multi-channel  vector.  The  associated 
Euler-Lagrange  equation  is  similar  to  the  scalar  model: 


dcj) 

dt 


m  [/idiy(||^)  -  a  sax*,  •  •  • .  c‘» 


|V0|. 


with  suitable  boundary  conditions  as  before.  Even  though  the  form  often  looks 
complicated  for  a  typical  application,  its  implementation  is  very  similar  to  that  of 
the  scalar  model. 

Numerical  results  support  our  above  efforts.  Figure  9  shows  two  different 
occlusions  of  a  triangle.  We  are  able  to  successfully  recover  the  union,  the  inter¬ 
section,  and  the  differentiation  of  the  objects  in  Figure  10  using  our  model.  In 
Figure  11,  we  have  a  two-channel  image  of  the  brain.  In  one  we  have  a  “tumor” 
with  some  noise,  while  the  other  is  clear.  The  images  are  not  registered.  We 
want  to  find  A±  n  ~'A2  so  that  the  tumor  can  be  observed.  This  happens  to  be  a 
very  complicated  example  as  there  are  a  lot  of  features  and  textures.  However  the 
model  finds  the  tumor  successfully. 
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